I Can See Your Aim: Estimating User Attention From Gaze For Handheld Robot Collaboration
This paper explores the estimation of user attention in the setting of a
cooperative handheld robot: a robot designed to behave as a handheld tool but
that has some level of task knowledge. We use a tool-mounted gaze-tracking
system which, after modelling via a pilot study, serves as a proxy for
estimating the attention of the user. This information is then used for
cooperation with users in a task of selecting and engaging with objects on a
dynamic screen. Via a video-game setup, we test various degrees of robot
autonomy, from fully autonomous, where the robot knows what it has to do and
acts, to no autonomy, where the user is in full control of the task. Our
results cover performance and subjective metrics, and show how the attention
model benefits both the interaction and users' preference.

Comment: this is a corrected version of the one that was published at IROS 201
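As a rough illustration of using gaze as an attention proxy, the sketch below smooths recent gaze samples and scores on-screen objects by proximity to the smoothed gaze point. All names and parameters here are assumptions for illustration; the paper's actual attention model was fitted via a pilot study and is not reproduced here.

```python
import numpy as np

def estimate_attended_object(gaze_xy, objects_xy, window=10, sigma=40.0):
    """Score each on-screen object by proximity to temporally smoothed gaze.

    gaze_xy    -- (T, 2) array of gaze points in screen coordinates
    objects_xy -- (N, 2) array of object centres
    window     -- number of recent samples averaged (hypothetical choice)
    sigma      -- spatial scale in pixels of the score (hypothetical choice)
    """
    # Average the last `window` gaze samples to suppress saccadic jitter.
    g = gaze_xy[-window:].mean(axis=0)
    # Gaussian score: objects near the smoothed gaze point score close to 1.
    d2 = ((objects_xy - g) ** 2).sum(axis=1)
    scores = np.exp(-d2 / (2 * sigma ** 2))
    return int(scores.argmax()), scores / scores.sum()
```

A semi-autonomous robot could, for instance, act on the top-scoring object only when its normalized score exceeds a confidence threshold, deferring to the user otherwise.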
Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
We present a method for assessing skill from video, applicable to a variety
of tasks ranging from surgery to drawing and rolling pizza dough. We formulate
the problem as pairwise (who's better?) and overall (who's best?) ranking of
video collections, using supervised deep ranking. We propose a novel loss
function that learns discriminative features when a pair of videos exhibits a
difference in skill, and learns shared features when a pair exhibits
comparable skill levels. Results demonstrate that our method is applicable
across tasks, with the percentage of correctly ordered pairs of videos ranging
from 70% to 83% across four datasets. We demonstrate the robustness of our
approach via a sensitivity analysis of its parameters. We see this work as a
step toward the automated organization of how-to video collections and, more
broadly, toward generic skill determination in video.

Comment: CVPR 2018
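To make the pairwise formulation concrete, here is a minimal sketch of such a loss in PyTorch, assuming a Siamese network that emits one scalar skill score per video. The margin ranking term, the similarity term, and all hyperparameters are assumptions for illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def skill_pair_loss(score_a, score_b, label, margin=1.0, lam=0.5):
    """Pairwise ranking loss over a batch of video pairs.

    score_a, score_b -- scalar skill scores from a shared (Siamese) network
    label            -- +1 if video A shows higher skill, -1 if B does,
                        0 if the pair exhibits comparable skill
    margin, lam      -- hypothetical hyperparameters
    """
    ranked = label != 0
    similar = ~ranked
    loss = score_a.new_zeros(())
    if ranked.any():
        # Discriminative term: push the higher-skill video's score above
        # the other's by at least `margin`.
        loss = loss + F.margin_ranking_loss(
            score_a[ranked], score_b[ranked],
            label[ranked].float(), margin=margin)
    if similar.any():
        # Comparable-skill pairs: encourage shared features / similar scores.
        loss = loss + lam * F.mse_loss(score_a[similar], score_b[similar])
    return loss
```

Ranked pairs drive the who's-better ordering, while the similarity term stops the network from fabricating differences between comparably skilled pairs.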
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video
Manual annotations of temporal bounds for object interactions (i.e. start and
end times) are the typical training input to recognition, localization and
detection algorithms. For three publicly available egocentric datasets, we
uncover inconsistencies in ground-truth temporal bounds within and across
annotators and datasets. We systematically assess the robustness of
state-of-the-art approaches to changes in labeled temporal bounds for object
interaction recognition. As boundaries are trespassed, a drop of up to 10% is
observed for both Improved Dense Trajectories and a Two-Stream Convolutional
Neural Network.
We demonstrate that such disagreement stems from a limited understanding of
the distinct phases of an action, and propose annotating based on the Rubicon
Boundaries, inspired by a similarly named cognitive model, for consistent
temporal bounds of object interactions. Evaluated on a public dataset, we
report a 4% increase in overall accuracy, and an increase in accuracy for 55%
of classes, when Rubicon Boundaries are used for temporal annotations.

Comment: ICCV 2017
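The robustness experiment can be imitated by jittering annotated bounds before re-evaluating a fixed recognizer; below is a minimal sketch in which the shift magnitudes and function names are assumptions rather than the paper's protocol.

```python
import random

def trespass_bounds(start, end, max_shift=1.0, min_len=0.5):
    """Randomly shift a segment's start/end times (in seconds) to simulate
    annotator disagreement on temporal bounds.

    max_shift -- maximum shift applied to each bound (hypothetical)
    min_len   -- shortest segment allowed after perturbation (hypothetical)
    """
    new_start = start + random.uniform(-max_shift, max_shift)
    new_end = end + random.uniform(-max_shift, max_shift)
    new_start = max(0.0, new_start)
    new_end = max(new_start + min_len, new_end)
    return new_start, new_end

# Evaluating the same model on segments perturbed this way exposes its
# sensitivity to the labeled temporal bounds.
```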
- …